SIMD-0177: Program Runtime ABI v2 #177
| - Readonly instruction accounts get no growth padding. | ||
| - For writable instruction accounts additional capacity is allocated and mapped | ||
| for potential account growth. The maximum capacity is the length of the account | ||
| payload at the beginning of the transaction plus 10 KiB. CPI can not grow |
does this get affected by this other SIMD?
https://github.com/solana-foundation/solana-improvement-documents/pull/163/files
They are independent. SIMD-0163 is about the program being called, which is not affected by this SIMD.
@nickfrosty Fwiw, this means the realloc limit is unchanged as well.
We could increase the account realloc / resize limit if there are no interactions of ABI v0/v1 and ABI v2 programs in the CPI call tree of a top level instruction. See the discussion with Sean below.
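To make the capacity rule in the quoted lines concrete, here is a minimal sketch. The helper name `mapped_capacity` and the constant name are hypothetical; only the 10 KiB figure and the readonly/writable distinction come from the proposal text.

```rust
/// Growth padding for writable instruction accounts: 10 KiB, as quoted above.
const MAX_GROWTH_BYTES: u64 = 10 * 1024;

/// Hypothetical helper: capacity mapped for an instruction account, given the
/// payload length at the beginning of the transaction.
fn mapped_capacity(initial_len: u64, writable: bool) -> u64 {
    if writable {
        // Writable accounts get extra room for potential growth.
        initial_len.saturating_add(MAX_GROWTH_BYTES)
    } else {
        // Readonly accounts get no growth padding.
        initial_len
    }
}
```

For example, a writable account that starts at 100 bytes would be mapped with room for 10,340 bytes, while the same account passed readonly would be mapped with exactly 100 bytes.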
| - Magic: `u32`: `0x76494241` ("ABIv" encoded in ASCII) | ||
| - ABI version `u32`: `0x00000002` | ||
| - Pointer to instruction data: `u64` | ||
| - Length of instruction data: `u32` |
I was thinking about this: if the data were presented in a way that makes sense to Rust, e.g. regular slices with a `u64` pointer and a `u64` length, then Rust programs would not have to do any entry processing at all and could just cast the 4 GiB address to a type and be done.
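As a rough illustration of the first two header fields quoted above, the sketch below checks the magic and version. It assumes little-endian encoding, and `check_header` is a hypothetical helper, not part of the proposal.

```rust
/// Magic value from the quoted layout; its little-endian bytes spell "ABIv".
const ABI_MAGIC: u32 = 0x7649_4241;
/// ABI version from the quoted layout.
const ABI_VERSION: u32 = 0x0000_0002;

/// Hypothetical helper: validates the first eight bytes of the header,
/// assuming both fields are stored little-endian.
fn check_header(bytes: &[u8]) -> bool {
    if bytes.len() < 8 {
        return false;
    }
    let magic = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let version = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    magic == ABI_MAGIC && version == ABI_VERSION
}
```

With a pointer/length pair laid out as two `u64` values, as suggested in the comment, a Rust program could in principle reinterpret the mapped region directly instead of parsing it field by field like this.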
| - Key: `[u8; 32]` | ||
| - Owner: `[u8; 32]` | ||
| - Lamports: `u64` | ||
| - Account payload: `&[u8]` which is composed of: | ||
| - Pointer to account payload: `u64` | ||
| - Account payload length: `u64` |
Programs also have access to the booleans writable, signer and executable. Are we serializing these ones as well?
These are per instruction not per transaction. See the "flags bitfield" in "Per Instruction Serialization".
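The per-account fields quoted above can be sketched as a `#[repr(C)]` struct. The struct and field names are assumptions made for illustration; the field types and order follow the quoted list.

```rust
use std::mem::{align_of, size_of};

/// Sketch of one transaction-account metadata entry (names are hypothetical).
#[allow(dead_code)]
#[repr(C)]
struct AccountMetadata {
    key: [u8; 32],
    owner: [u8; 32],
    lamports: u64,
    /// VM address of the account payload.
    payload_ptr: u64,
    /// Length of the account payload in bytes.
    payload_len: u64,
}

/// Size of one entry under this layout: 32 + 32 + 8 + 8 + 8 bytes.
const ACCOUNT_METADATA_SIZE: usize = size_of::<AccountMetadata>();
/// Alignment is driven by the `u64` fields.
const ACCOUNT_METADATA_ALIGN: usize = align_of::<AccountMetadata>();
```

Under this layout there are no padding bytes, and the signer and writable flags live in the per-instruction structures rather than here, as the reply above points out.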
| - For each transaction account: | ||
| - Key: `[u8; 32]` | ||
| - Owner: `[u8; 32]` | ||
| - Lamports: `u64` |
After a couple of discussions with @Lichtso, we thought the feedback from developer relations would be important here.
Today programs can only see the accounts passed to them in the instruction being executed. This layout change entails that programs (and every program they invoke via CPI) will now be able to access metadata from all the accounts passed in the transaction, regardless of whether they were passed in the instruction or not. We still intend to keep the account payload hidden, though, if it is not an instruction account.
Would this change have any unintended consequences on the developer side?
(cc. @joncinque and @jacobcreech )
| The `AccountInfo` parameter of the CPI syscalls (`sol_invoke_signed_c` and | ||
| `sol_invoke_signed_rust`) will be ignored if ABI v2 is in use. Instead the | ||
| changes to account metadata will be communicated explicitly through separate | ||
| syscalls `sol_set_account_owner`, `sol_set_account_lamports` and |
Perhaps we need to mention the expected cost of sol_set_account_lamports to update lamports of an account – this is a quite common operation in programs.
In general the SIMD still needs to define the CU charging for the four syscalls and for the number of instruction accounts.
| The runtime must only map the payload for accounts that belong in the current | ||
| executing instruction. The payload for accounts belonging to sibling instructions | ||
| must NOT be mapped. |
It might be easier to always map in all accounts which are not referenced in an instruction as readonly. That way we wouldn't even have to hide / reveal them on every instruction, thus it is less work for the validator and more available data for the programs.
Also, we already load all sysvar accounts, so we might as well expose them here too. That would, however, either raise the maximum transaction account number beyond 255 or require a new range of transaction accounts, but that is harder to pull off because of possible aliasing with sysvars which were mentioned in the message.
Co-authored-by: Alex Kahn <43892045+alnoki@users.noreply.github.com>
A comment on the status of this proposal: We are developing the infrastructure on the validator to accommodate all the necessary changes for ABIv2, regardless of what data layout we choose. Once that is ready, this is the plan:
Should we kick this one back to a SIMD discussion for cataloging this discovery information and open up a fresh SIMD once we're ready to quibble over data layout and specifics?
@bw-solana yes please! Plenty of folks like myself are deep in VM-level implementation mechanics and would like a chance to talk through things with a fresh context based on this current revision as discovery information.
We already have a prototype with roughly the layout presented here.
This one has accumulated clutter and comments on historic designs, which are now outdated, that is true.
Open to that, but wanted to note that the discussion page has no central document to work on a spec as a whole, which is important as all parts need to stay consistent and compatible with one another.
| the CPI scratchpad. At the beginning of every instruction, these scratchpads | ||
| must be empty and their size must be zero. | ||
|
| Programs must set the desired length for them using the `set_buffer_length` |
Since it is possible to statically determine the maximum length for the return-data scratchpad, one alternative is to always have MAX_RETURN_DATA bytes allocated. Programs then write as many bytes as they need, prefixing with the length if needed, without having to set the length. This might save a syscall call on the program side.
| ### VM initialization | ||
|
| During the initialization of the virtual machine, the runtime must load the |
Couple of ideas for this:
- We could also add the pointer to the instruction data (R4) and length of it (R5).
- I wonder if having only R1 is enough: the other pointers and lengths can be computed with a static offset from R1, so we can "save" those registers for other uses.
Co-authored-by: febo <febo@anza.xyz>
Co-authored-by: Simonas Kazlauskas <github@kazlauskas.me>
| The `set_buffer_length` must charge a base cost (to be determined) plus the | ||
| same CU per byte ratio as the `memset` syscall. |
While implementing this I found the specification to be insufficient here. The obvious interpretation would be to charge the per-byte fee for the additional bytes that are being allocated (and not at all if the buffer size is being reduced) as those are the only bytes that need memsetting to 0.
At the same time, any increase in buffer size may require a realloc and memcpy from previous buffer to the new one. So charging per_byte * new_length any time the buffer size would increase (new_length > current_length) would seem like a more correct option. I think we can get away with a single fee here, as the cost of memcpy and memset is in roughly the same ballpark.
Question is: are we okay with charging a significant fee when the buffer is being resized? This fee can be especially painful to resizes where the base buffer is large and is only increased in size by a little bit every time, but it would correctly reflect the computation load.
EDIT: An alternative to charging for reallocations would be to maintain a pre-allocated pool of buffers such that each region already gets a buffer at least as large as the largest region requestable. I don't think that's feasible, though, especially with the outlook of having buffers larger than 10 MiB.
So charging per_byte * new_length any time the buffer size would increase (new_length > current_length) would seem like a more correct option. I think we can get away with a single fee here, as the cost of memcpy and memset is in roughly the same ballpark.
I believe charging for the entire new size during account growths is the right solution for now. If we can come up with another implementation on the validator side that allows us to decrease costs, we can do so later.
Decreasing costs is always easier than increasing them.
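The charging rule discussed in this thread (charge for the whole new length whenever the buffer grows, and only the base cost otherwise) could be sketched as follows. The function name and both parameters `base_cost` and `bytes_per_cu` are hypothetical; the proposal leaves the base cost to be determined and ties the per-byte ratio to the `memset` syscall.

```rust
/// Hypothetical CU cost for `set_buffer_length`.
///
/// `base_cost` and `bytes_per_cu` are placeholders, not values from the
/// proposal. Growth is charged on the entire new length, since the runtime
/// may have to reallocate and copy the old contents; shrinking or keeping
/// the same length is charged the base cost only.
fn set_buffer_length_cost(
    base_cost: u64,
    bytes_per_cu: u64,
    current_len: u64,
    new_len: u64,
) -> u64 {
    if new_len > current_len {
        // Integer division by a non-zero ratio; a zero ratio is treated as 1.
        base_cost.saturating_add(new_len / bytes_per_cu.max(1))
    } else {
        base_cost
    }
}
```

With placeholder values of 10 CU base and 250 bytes per CU, growing a buffer from 1,000 to 2,000 bytes would cost 18 CU under this rule, and a later small increase to 2,001 bytes would again cost 18 CU. That is the "painful" case for large buffers grown in small steps that the comment above describes.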
| - Index to transaction account: `u16` | ||
| - Signer flag: `u8` (1 for signer, 0 for non-signer) | ||
| - Writable flag: `u8` (1 for writable, 0 for readonly) | ||
| - Pointer to the account metadata: `u64` (see [Account metadata area] |
@LucasSte We need to add some padding bytes (`[u8; 4]`) here so the `u64` value is aligned.
Hm, not sure using the VM ptr is the right thing here to begin with. This originally was a u16 index, which is what the program runtime internally uses. The runtime does not work on VM ptrs and that would also make no sense as converting them back is the same amount of work as converting them forward. This is another case of the same work either being on the program runtime side or the program side. The VM ptr approach does increase memory traffic 4 times.
The index in transaction is still there, so this change does not suggest that the runtime use the VM pointer. The VM pointer is easily calculated by the runtime, and surfacing this information to programs is beneficial since they save CUs when accessing the account metadata.
The runtime does not work on VM ptrs and that would also make no sense as converting them back is the same amount of work as converting them forward
The pointer is just base address + idx*region_size. What am I missing?
The VM ptr approach does increase memory traffic 4 times.
Can you elaborate more on this?
The pointer is just base address + idx*region_size
Exactly, the conversion is very simple, programs can do it on their own.
Can you elaborate more on this?
Storing a 64 bit pointer bloats the instruction account structure and means a whole lot more data must be written (and read) per instruction (also caused by additional padding requirements).
Other VMs also have to deal with this general problem, key word is "pointer compression":
Exactly, the conversion is very simple, programs can do it on their own.
Likewise, runtime can do it for free.
Storing a 64 bit pointer bloats the instruction account structure and means a whole lot more data must be written (and read) per instruction (also caused by additional padding requirements).
I don't think 16 bytes (in total per account) is anything to worry about. We write a lot more than that for other data structures. Perhaps, the impact of this change can be benchmarked to determine if it harms runtime performance.
We could try decreasing the page alignment from 4GB to 2GB or 1GB, so that addresses fit in a u32.
As discussed, this won't be enough savings to get to 32 bit pointers. You would still end up with a 6 byte pointer, which is a thing on some architectures (as they ignore the high bits).
Right, I'll remove the pointer field for now.
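The two positions in this thread can be compared with a small sketch. The base address `0x500000000` is the account metadata address shown in the review summary's sequence diagram; the 88-byte stride and all struct and function names are assumptions for illustration only.

```rust
use std::mem::size_of;

/// Account metadata base address, as shown in the sequence diagram.
const ACCOUNT_METADATA_BASE: u64 = 0x5_0000_0000;
/// Assumed size of one metadata entry (key + owner + lamports + ptr + len).
const ACCOUNT_METADATA_STRIDE: u64 = 88;

/// "The pointer is just base address + idx * region_size."
fn account_metadata_addr(index: u16) -> u64 {
    ACCOUNT_METADATA_BASE + u64::from(index) * ACCOUNT_METADATA_STRIDE
}

/// Compact form: index plus two flags.
#[allow(dead_code)]
#[repr(C)]
struct InstructionAccount {
    account_index: u16,
    is_signer: u8,
    is_writable: u8,
}

/// Form with the VM pointer and the padding needed to align the `u64`.
#[allow(dead_code)]
#[repr(C)]
struct InstructionAccountWithPtr {
    account_index: u16,
    is_signer: u8,
    is_writable: u8,
    _padding: [u8; 4],
    metadata_ptr: u64,
}

const COMPACT_SIZE: usize = size_of::<InstructionAccount>();
const WITH_PTR_SIZE: usize = size_of::<InstructionAccountWithPtr>();
```

Under these assumptions the compact entry is 4 bytes and the pointer-carrying entry is 16 bytes, which matches both the "4 times" memory traffic figure and the "16 bytes (in total per account)" figure quoted in the thread.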
Greptile Summary
This PR introduces SIMD-0177, a proposal for a new Program Runtime ABI (v2) that redesigns the sBPF virtual address space to use large-page-aligned regions, enabling direct mapping of account payloads, instruction data, and return data, and replacing per-instruction serialization with shared, transaction-scoped memory areas.
Confidence Score: 3/5
This proposal documents an important and complex ABI change but has several gaps that need resolution before the design is complete enough to implement safely. The Security Considerations section is entirely unfilled for an ABI that introduces shared transaction-scoped memory regions and new privilege-check paths. The prerequisite (SIMD-0219) doesn't exist in the repository yet. The address layout imposes a hard cap of roughly 63 total instructions (top-level + CPI combined) that goes unacknowledged, and the CPI-return scratchpad advancement rule is ambiguous enough that two implementors could produce incompatible behavior for programs making sequential CPIs.
proposals/0177-program-runtime-abiv2.md: the Security Considerations, CPI return logic, and instruction-count cap sections all need substantive content before this proposal is ready for broader review.
Important Files Changed
Sequence Diagram
sequenceDiagram
participant RT as Runtime
participant TxMeta as TxMetadata (0x400000000)
participant AccMeta as AccMetadata (0x500000000)
participant InstrArea as InstrArea (0x600000000)
participant Prog as ABIv2 Program
RT->>TxMeta: Initialize (return-data ptr, CPI scratchpad ptr, instr index, account count)
RT->>AccMeta: Initialize (key, owner, lamports, payload ptr per account)
RT->>InstrArea: Initialize instruction entries (program idx, CPI level, accounts ref, data ref)
RT->>Prog: "Execute (R1=instr meta, R2=acct slice, R3=acct count, R4=instr data, R5=data len)"
Prog->>Prog: set_buffer_length(CPI scratchpad base, new_len)
Prog->>Prog: "Write CPI accounts to 0x14800000000 + N*0x100000000"
Prog->>Prog: "Write CPI data to 0x10800000000 + N*0x100000000"
Prog->>RT: sol_invoke_signed_v2(program_idx, signer_seeds_ptr, seeds_len)
RT->>RT: Verify account indexes and privilege flags
RT->>InstrArea: Append new instruction entry
RT->>TxMeta: Update CPI scratchpad ptr, executing instr index, instr count
RT->>Prog: Execute callee (ABIv2)
Prog-->>RT: Return
RT->>TxMeta: Advance CPI scratchpad ptr (+0x100000000)
RT->>AccMeta: Update write permissions per ownership changes
RT->>TxMeta: Update executing instruction index
Reviews (1): Last reviewed commit: "Remove pointer field"
| ## Security Considerations | ||
|
| What security implications/considerations come with implementing this feature? | ||
| Are there any implementation-specific guidance or pitfalls? |
Security Considerations section is unfilled
The Security Considerations section retains only the boilerplate template questions and contains no actual content. For a proposal of this scale — a new VM ABI, shared memory regions across instructions, direct account payload mapping, and new privilege-elevation checks for CPIs — leaving this section empty blocks any meaningful security review. At a minimum it should address: (1) how privilege escalation via sol_invoke_signed_v2 is prevented when the program writing to the CPI scratchpad is not the current executor, (2) what prevents a program from crafting an InstructionAccount entry that references an account index outside its instruction scope, and (3) memory-safety implications of the set_buffer_length syscall if the new length exceeds the backing allocation.
| status: Idea | ||
| created: 2025-02-23 | ||
| feature: TBD | ||
| extends: SIMD-0219 |
extends: SIMD-0219 references a non-existent proposal
SIMD-0219 does not exist in this repository. The Motivation section states "Direct mapping of the account payload data is enabled by SIMD-0219," making SIMD-0219 a hard prerequisite for this design. Referencing a proposal that hasn't been filed creates an incomplete dependency chain — reviewers have no way to evaluate whether the foundation this SIMD builds on is sound or accepted.
| #### Instruction payload area | ||
|
| For each instruction, the runtime must map its payload at address | ||
| `0x10800000000` plus `0x100000000` times the index of the instruction in the | ||
| transaction. All instruction payload mappings are readonly. | ||
|
| One extra writable mapping must be created after the last instruction payload | ||
| area to be the CPI scratch pad, i.e. at address `0x10800000000` plus | ||
| `0x100000000` times the number of instructions in the transaction. Its purpose | ||
| is for programs to write CPI instruction data directly to it and avoid copies. | ||
|
| #### Instruction accounts area | ||
|
| For each instruction, the runtime must map an array of `InstructionAccount` | ||
| (as previously defined) at address `0x14800000000` plus `0x100000000` times | ||
| the index of the instruction in the transaction. This mapped area is readonly. | ||
|
| Each of these memory regions contains the following for each instruction: | ||
|
| - For each account in instruction: | ||
|   - `InstructionAccount`, consisting of: | ||
|     - Index to transaction account: `u16` | ||
|     - Signer flag: `u8` (1 for signer, 0 for non-signer) | ||
|     - Writable flag: `u8` (1 for writable, 0 for readonly) | ||
|
| One extra writable mapping must be created after the last instruction accounts | ||
| area to be the CPI scratch pad, i.e. at address `0x14800000000` plus | ||
| `0x100000000` times the number of instructions in the transaction. Its purpose | ||
| is for programs to write CPI accounts directly to it and avoid copies. |
Implicit 63-instruction total cap is undocumented
The instruction payload area starts at 0x10800000000 and the instruction accounts area starts at 0x14800000000, a difference of 0x4000000000 = 64 × 0x100000000. This means the layout physically supports at most 63 instruction payload slots plus 1 CPI scratchpad before the two regions would overlap. The same 64-slot constraint applies symmetrically to the instruction accounts area (capped by the sysvar area at 0x18800000000). The spec tracks "Total number of instructions in transaction (including CPIs and top level instructions)" but never states this hard cap. With Solana's existing CPI nesting depth of 4 and many top-level instructions, worst-case totals can exceed 64 entries.
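The arithmetic behind this observation can be checked with a short sketch. The constant and function names are hypothetical; the addresses and the stride are the ones quoted in the proposal text and in the comment above.

```rust
/// Base of the instruction payload area.
const INSTRUCTION_PAYLOAD_BASE: u64 = 0x108_0000_0000;
/// Base of the instruction accounts area.
const INSTRUCTION_ACCOUNTS_BASE: u64 = 0x148_0000_0000;
/// Base of the sysvar area mentioned in the comment.
const SYSVAR_BASE: u64 = 0x188_0000_0000;
/// Distance between consecutive per-instruction mappings.
const REGION_STRIDE: u64 = 0x1_0000_0000;

/// Address of the payload mapping for the instruction at `index`.
fn instruction_payload_addr(index: u64) -> u64 {
    INSTRUCTION_PAYLOAD_BASE + index * REGION_STRIDE
}

/// Number of stride-sized slots that fit between two area bases.
fn slots_between(lower_base: u64, upper_base: u64) -> u64 {
    (upper_base - lower_base) / REGION_STRIDE
}
```

Both gaps work out to 64 slots. Since one slot is always taken by the CPI scratchpad, that leaves at most 63 instruction entries, counting top-level instructions and CPIs together, before the payload area would run into the accounts area.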
| When the CPI returns, the runtime must do the following: | ||
|
| 1. Update the address for the CPI scratchpad, and keep the previously used one | ||
| in its existing address assigned during CPI call. The new CPI scratchpad | ||
| address is the same as the previous one plus `0x100000000`. | ||
| 2. Change the read and write permission for the account payload regions, |
CPI-return scratchpad advance logic is ambiguous
Step 1 states: "The new CPI scratchpad address is the same as the previous one plus 0x100000000." It is unclear what "the previous one" refers to — the caller's pre-call scratchpad address or the callee's scratchpad address. With sequential CPIs, the scratchpad address keeps advancing, consuming slots from the instruction payload area. Combined with the undocumented total-instruction cap, a program making many sequential CPIs will silently exhaust the available slots. The spec should define whether this advancement is intentional and what happens when the caller's scratchpad address reaches the boundary of the instruction accounts area.
| 6. Update the address for the callee CPI scratchpad, the index of current | ||
| executing transaction, and the number of instructions in transaction at | ||
| address `0x400000000`. |
Typo: "executing transaction" should be "executing instruction" — the transaction metadata area stores the "Index of current executing instruction" (line 79), not a transaction index.
| 6. Update the address for the callee CPI scratchpad, the index of current | |
| executing transaction, and the number of instructions in transaction at | |
| address `0x400000000`. | |
| 6. Update the address for the callee CPI scratchpad, the index of current | |
| executing instruction, and the number of instructions in transaction at | |
| address `0x400000000`. |
| - For each account in instruction: | ||
| - `InstructionAccount`, consisting of: | ||
| - Index to transaction account: `u16` | ||
| - Signer flag: `u8` (1 for signer, 0 for non-signer) |
|
| 0. Clock | ||
| 1. Epoch rewards | ||
| 2. Epoch Schedule |
| execution. | ||
| 4. Register R4: A pointer to the instruction payload of the instruction under | ||
| execution (see section [Instruction payload area](#instruction-payload-area)). | ||
| 5. Register R5: The payload length for the instruction under execution. |